Industrial Parsing of Software Manuals

Abstract

of the parser in recognising the prescribed features. At various points, decisions had to be made about what counted as a given feature. We have taken the category of verb to include auxiliary verbs, and also present and past participles where these are constituents of a verb phrase but not otherwise. Thus "the sun is rising" contains two verbs, while "the rising sun" contains none. In the category of noun we have included: verb nouns such as "hunting" in a phrase like "the hunting of the snark", proper nouns, and also phrases like "Edit/Cut" in sentence L24. We have excluded pronouns (though these have the same category as noun in our parsing scheme) and also nouns used to modify other nouns, such as "source" in "source sentence".

We have taken compounds to mean strings, mainly names, which are indivisible units from the grammatical point of view but are written with white space between them; examples in the IPSM test set are "Word for Windows 6.0" (T2) and "Translate Until Next Fuzzy Match" (T8). ALICE attempts to identify such phrases in preprocessing, using capitalisation as the main clue.

A number of questions arise in evaluating the analysis of phrase boundaries. Our categorial formalism commits us to a subject-predicate analysis which is equivalent to a set of PSG rules of binary form only, so that for instance a conjunction of sentences has to be analysed as the application of "and" to one sentence to produce a modifier of the other sentence; but from a grammatical point of view it is an arbitrary choice which sentence is seen as modifying the other. Other examples of arbitrary choices involve attachment, where very often there is no semantic difference between different ways of bracketing a sequence of noun or verb modifiers (the problem of "spurious ambiguity"). It is thus not possible to compare an actual parse with an ideal parse, because there is often more than one correct way of parsing a sentence.
Furthermore, our formalism commits us to deep rather than shallow nesting of phrases inside each other, so that errors are very liable to be propagated upwards: for instance, where a verb fails to attach to one of three arguments, the three phrases in which the argument should have been nested will all be wrongly analysed. We have therefore chosen to assign scores to boundaries between words. Any two neighbouring words will belong …
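
The capitalisation clue for compounds described above can be illustrated with a minimal sketch. This is not ALICE's actual preprocessing, which the text does not specify; the function names, the `CONNECTORS` set, and the version-number rule are all assumptions made for illustration. The idea is to collect runs of capitalised tokens, allowing a short lower-case connector or a version number inside the run, and to treat any run of two or more tokens as a candidate compound.

```python
import re

# Hypothetical list of lower-case words allowed inside a capitalised name.
CONNECTORS = {"for", "of", "and"}


def is_capitalised(token):
    """True if the token starts with an upper-case letter."""
    return token[:1].isupper()


def is_version(token):
    """True for version-like tokens such as '6.0' (an assumed heuristic)."""
    return re.fullmatch(r"\d+(\.\d+)*", token) is not None


def find_compounds(tokens):
    """Return candidate compounds: runs of two or more tokens that start
    with a capitalised word and continue through capitalised words,
    version numbers, or a connector followed by a capitalised word."""
    compounds = []
    i = 0
    while i < len(tokens):
        if not is_capitalised(tokens[i]):
            i += 1
            continue
        j = i + 1
        while j < len(tokens):
            tok = tokens[j]
            if is_capitalised(tok) or is_version(tok):
                j += 1
            elif (tok.lower() in CONNECTORS
                  and j + 1 < len(tokens)
                  and is_capitalised(tokens[j + 1])):
                j += 2  # take the connector together with the next word
            else:
                break
        if j - i >= 2:
            compounds.append(" ".join(tokens[i:j]))
        i = j
    return compounds


# The surrounding words are invented context; the compounds themselves
# are the two examples quoted from the IPSM test set.
print(find_compounds("you can open Word for Windows 6.0 now".split()))
print(find_compounds("select Translate Until Next Fuzzy Match from the menu".split()))
```

A sketch like this would wrongly absorb a capitalised sentence-initial word into a following name, which is one reason capitalisation can only be, as the text says, the main clue rather than the sole one.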


Similar resources

Industrial Parsing of Software Manuals

This book is a collection of articles written by research teams (eight from seven countries) that participated in the workshop Industrial Parsing of Software Manuals, held at the University of Limerick, Ireland, in 1995. However, unlike a typical proceedings volume, the book has a strong unifying theme: reporting the behavior and measuring the performance of a collection of parsing systems on a...


Parsing Computer Manuals using a Robust

In this paper, we describe the basic Alvey Natural Language Toolkit and a set of modifications we have made to it to enhance its robustness. Following this, we report on a series of experiments that show the performance of the robust ANLT for tasks that involve the parsing of software manuals. The main findings, with respect to the robust ANLT, are that the shorter the sentence, the greater the ch...


An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline mode. Unfortunately, in pipel...


Creating semantic relations for user manuals on Semantic Web using SKOS

The aim of the paper is to create a standard model representation using SKOS for user manuals, to benefit from an automatic mechanism for representing the information and explanations necessary for each software application. SKOS (Simple Knowledge Organization System) is a standard for representing concept schemes for different types of KOS (Knowledge Organization Systems). This method of kno...


Reducing the Complexity of Parsing by a Method of Decomposition

The complexity of parsing English sentences can be reduced by decomposing the problem into three subtasks. Declarative sentences can almost always be segmented into three concatenated sections: pre-subject, subject, predicate. Other constituents, such as clauses, phrases, and noun groups, are contained within these segments but do not normally cross the boundaries between them. Though a constituent in one s...



Publication date: 1995